Markov Decision Processes with Unobserved Confounders: A Causal Approach
Authors
Abstract
Markov decision processes (MDPs) constitute one of the most general frameworks for modeling decision-making under uncertainty and are used in many fields, including economics, medicine, and engineering. The goal of the agent in an MDP setting is to learn about the environment so as to optimize a certain criterion. This task is pursued by exploring the environment through active interventions (i.e., by randomizing its actions), which contrasts with the agent passively observing the environment without exerting any control over it (i.e., through random sampling). The existence of unobserved confounders, namely, unmeasured variables affecting both the action and the outcome, or both the action and the state variables, implies that these two data-collection modes (passive and active) will in general not coincide. In particular, by performing interventions, any inclination (intuition) of the agent is ignored, which implies a loss of information and, in general, a failure to achieve optimal behavior. In this paper, we formalize this observation and study its conceptual and algorithmic implications. We first demonstrate that standard algorithms may act sub-optimally when unobserved confounders are present. We then propose a systematic method to enhance these algorithms using causal inference theory and leveraging observational data. We formally and empirically show that this new approach yields superior results to those of current state-of-the-art MDP algorithms.
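To make the abstract's central point concrete, the following Python sketch simulates a single-decision toy setting. All variable names, probabilities, and the confounding structure are illustrative assumptions, not taken from the paper: an unobserved confounder U drives both the agent's natural inclination and the reward, so randomized interventions that discard the inclination see only an averaged reward, while interventions that also record what the agent would have chosen recover the extra information.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical unobserved confounder U (binary), affecting both the agent's
# natural choice of action and the reward.
U = rng.integers(0, 2, size=n)

# The agent's "intuition": left to itself, it picks X = U about 90% of the time.
X_obs = np.where(rng.random(n) < 0.9, U, 1 - U)

def reward(x, u, rng):
    # Reward probabilities chosen so the best action depends on U:
    # following the intuition (x == u) pays less than going against it.
    p = np.where(x == u, 0.2, 0.6)
    return (rng.random(len(u)) < p).astype(float)

Y_obs = reward(X_obs, U, rng)

# Observational estimate E[Y | X = x]  (passive data collection).
for x in (0, 1):
    print(f"E[Y | X={x}]                 = {Y_obs[X_obs == x].mean():.3f}")

# Interventional estimate E[Y | do(X = x)]  (standard randomized exploration,
# which ignores the agent's inclination and hence the information in U).
for x in (0, 1):
    X_do = np.full(n, x)
    print(f"E[Y | do(X={x})]             = {reward(X_do, U, rng).mean():.3f}")

# Intuition-aware estimate: randomize the action but condition on the action
# the agent would naturally have taken, exposing the confounder's strata.
for x in (0, 1):
    for x_nat in (0, 1):
        mask = X_obs == x_nat
        X_do = np.full(mask.sum(), x)
        val = reward(X_do, U[mask], rng).mean()
        print(f"E[Y | do(X={x}), intuition={x_nat}] = {val:.3f}")
```

In this toy run the observational and interventional quantities disagree, and the intuition-conditioned estimates separate into a high-reward and a low-reward stratum that plain randomization averages away; this is one way to read the abstract's claim that ignoring the agent's inclination loses information.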
Similar resources
Bandits with Unobserved Confounders: A Causal Approach
The Multi-Armed Bandit problem constitutes an archetypal setting for sequential decision-making, permeating multiple domains including engineering, business, and medicine. One of the hallmarks of a bandit setting is the agent’s capacity to explore its environment through active intervention, which contrasts with the ability to collect passive data by estimating associational relationships betwe...
The Control Outcome Calibration Approach for Causal Inference With Unobserved Confounding
Unobserved confounding can seldom be ruled out with certainty in nonexperimental studies. Negative controls are sometimes used in epidemiologic practice to detect the presence of unobserved confounding. An outcome is said to be a valid negative control variable to the extent that it is influenced by unobserved confounders of the exposure effects on the outcome in view, although not directly inf...
Fads Models with Markov Switching Hetroskedasticity: decomposing Tehran Stock Exchange return into Permanent and Transitory Components
Stochastic behavior of stock returns is very important for investors and policy makers in the stock market. In this paper, the stochastic behavior of the return index of Tehran Stock Exchange (TEDPIX) is examined using unobserved component Markov switching model (UC-MS) for the 3/27/2010 until 8/3/2015 period. In this model, stock returns are decomposed into two components; a permanent componen...
Split-door criterion for causal identification: Automatic search for natural experiments
Unobserved or unknown confounders complicate even the simplest attempts to estimate the effect of one variable on another using observational data. When cause and effect are both affected by unobserved confounders, methods based on identifying natural experiments have been proposed to eliminate confounds. However, their validity is hard to verify because they depend on assumptions about the ind...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
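As a rough sketch of the decomposition idea summarized in that entry (not the cited authors' algorithm), the snippet below uses networkx to partition a toy state transition graph into strongly connected components and order them into levels via the condensation DAG; the transition structure and variable names are hypothetical.

```python
import networkx as nx

# Hypothetical MDP transition structure: an edge s -> t means some action can
# move the agent from state s to state t with positive probability.
transitions = {0: [1], 1: [0, 2], 2: [3], 3: [2, 4], 4: [4]}
G = nx.DiGraph((s, t) for s, succ in transitions.items() for t in succ)

# Strongly connected components of the state space.
sccs = list(nx.strongly_connected_components(G))

# The condensation is a DAG over the SCCs; a topological order of it gives one
# possible level structure in which restricted MDPs could be solved and combined.
C = nx.condensation(G, scc=sccs)
for i in nx.topological_sort(C):
    print(f"level position {i}: states {sorted(C.nodes[i]['members'])}")
```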
Journal title:
Volume / Issue
Pages -
Publication date: 2016